On Efficient and Effective Association Rule Mining from XML Data
Abstract
In this paper, we propose a framework, called XAR-Miner, for mining association rules (ARs) from XML documents efficiently and effectively. In XAR-Miner, raw XML data are first transformed into either an Indexed Content Tree (IX-tree) or Multi-relational Databases (Multi-DB), depending on the size of the XML document and the memory constraints of the system, to support efficient data selection during AR mining. Concepts that are relevant to the AR mining task are generalized to produce generalized meta-patterns. A suitable metric is devised for measuring the degree of concept generalization in order to prevent under-generalization or over-generalization. The resulting generalized meta-patterns are used to generate large ARs that meet the support and confidence levels. An efficient AR mining algorithm is also presented, based on candidate AR generation in the hierarchy of generalized meta-patterns. The experiments show that XAR-Miner is more efficient at performing a large number of AR mining tasks on XML documents than the state-of-the-art method, which repeatedly scans the XML documents to perform each mining task.
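The abstract refers to rules that "meet the support and confidence levels" without defining the two measures. The following is a minimal sketch of how they are conventionally computed for a rule over transactions extracted from XML. It is not XAR-Miner itself: the toy document, the element names (`orders`, `order`, `item`), and the helper functions are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy document; element names are illustrative assumptions,
# not taken from the paper.
XML = """
<orders>
  <order><item>bread</item><item>milk</item></order>
  <order><item>bread</item><item>butter</item></order>
  <order><item>bread</item><item>milk</item><item>butter</item></order>
  <order><item>milk</item></order>
</orders>
"""


def transactions(xml_text):
    """Turn each <order> element into a set of its <item> texts."""
    root = ET.fromstring(xml_text)
    return [
        frozenset(item.text for item in order.findall("item"))
        for order in root.findall("order")
    ]


def support(itemset, txns):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(1 for t in txns if itemset <= t) / len(txns)


def confidence(lhs, rhs, txns):
    """support(lhs and rhs together) divided by support(lhs)."""
    lhs_support = support(lhs, txns)
    if lhs_support == 0:
        return 0.0
    return support(set(lhs) | set(rhs), txns) / lhs_support


txns = transactions(XML)
rule_support = support({"bread", "milk"}, txns)
rule_confidence = confidence({"bread"}, {"milk"}, txns)
```

A rule such as bread → milk would be retained only if both values reach user-chosen minimum thresholds. Note that this sketch rescans every transaction for each rule, which is the repeated-scan cost the abstract says the IX-tree and Multi-DB representations are meant to avoid.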
Similar resources
A new approach based on data envelopment analysis with double frontiers for ranking the discovered rules from data mining
Data envelopment analysis (DEA) is a relatively new data-oriented approach for evaluating the performance of a set of peer entities called decision-making units (DMUs) that convert multiple inputs into multiple outputs. Within a relatively limited period, DEA has become a strong quantitative and analytical tool for measuring and evaluating performance. In an article written by Toloo et al. (2009...
A New Model for Discovering XML Association Rules from XML Documents
The inherent flexibility of XML in both structure and semantics makes mining from XML data a complex task with more challenges than traditional association rule mining in relational databases. In this paper, we propose a new model for the effective extraction of generalized association rules from an XML document collection. We directly use frequent subtree mining techniques in the disco...
An Efficient Xml Database Mining without Candidate Generation: an Frequent Pattern Split Approach
The popularity of XML has led to the production of large numbers of XML documents. Developing an approach to association rule mining on native XML databases is therefore an important research topic. The FP-growth algorithm, which is based on an FP-tree, performs more efficiently than other association rule mining methods, but it cannot be applied to native XML databases. Hence, we adapt an improved FP-tree algo...
A Framework for Efficient Association Rule Mining in XML Data
In this paper, we propose a framework, called XAR-Miner, for mining ARs from XML documents efficiently. In XAR-Miner, raw data in the XML document are first preprocessed to transform to either an Indexed XML Tree (IX-tree) or Multi-relational Databases (Multi-DB), depending on the size of XML document and memory constraint of the system, for efficient data selection and AR mining. Concepts that...
Mining Association Rules from Structural Deltas of Historical XML Documents
Previous work on XML association rule mining focuses on mining from the data existing in XML documents at a certain time point. However, due to the dynamic nature of online information, an XML document typically evolves over time. Knowledge obtained from mining the evolvement of an XML document would be useful in a wide range of applications, such as XML indexing, XML clustering. In this paper,...
Publication date: 2004